[RUNTIME] Simple NDArray container API in C++ #1418
Conversation
Overall looks like a great change and will be very useful in Relay. Just put a couple clarifying questions on the PR.
include/tvm/runtime/ndarray.h
 * TVMSynchronize is necessary.
 */
inline void copyto(DLTensor* other);
inline void copyto(const NDArray& other);
Do we not use Google style for these? Just want to clarify: for example, CopyTo or Copyto?
I am debating this. One exception rule here is that for user-facing APIs, we try to make them as consistent as possible with numpy, which means functions like empty are used here. But for CopyTo, I think CamelCase is also good.
I switched to CamelCase
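To make the convention concrete, here is a hedged sketch of how the resolved style reads in user code. It assumes the CopyTo overloads shown in the diff above; the function and variable names are made up:

#include <tvm/runtime/ndarray.h>

// Sketch only: CamelCase method names, per the resolution of this thread.
// 'src' and 'dst' are hypothetical NDArray handles.
void CopyExample(const tvm::runtime::NDArray& src, tvm::runtime::NDArray dst) {
  src.CopyTo(dst);  // CamelCase, replacing the copyto spelling in the outdated diff
}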
 */
std::vector<int64_t> shape_;
/*! \brief The internal array object */
std::atomic<int> ref_counter_{0};
Do we expect to share these across threads without synchronization? Atomic ref-counts are pretty expensive compared to non-atomic ones.
We might need this for multi-threaded purposes. In most cases, the overhead of the atomic is fine compared to the overhead of other things (computing, Python, and kernel launch).
👍
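For readers weighing the tradeoff discussed above, a minimal sketch of atomic intrusive reference counting; this is illustrative, not the PR's exact container code:

#include <atomic>

// Hypothetical intrusive ref-counted base, sketching the cost discussed above.
struct RefCounted {
  std::atomic<int> ref_counter_{0};

  void IncRef() {
    // Atomic increment is thread-safe but costlier than a plain ++ on an int.
    ref_counter_.fetch_add(1, std::memory_order_relaxed);
  }

  void DecRef() {
    // The release/acquire pairing makes prior writes visible to whichever
    // thread performs the final decrement and destroys the object.
    if (ref_counter_.fetch_sub(1, std::memory_order_release) == 1) {
      std::atomic_thread_fence(std::memory_order_acquire);
      delete this;
    }
  }

  virtual ~RefCounted() = default;
};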
include/tvm/runtime/ndarray.h
std::atomic<int> ref_counter_{0};
};

// implementations
Improve comment?
@@ -394,8 +409,9 @@ class TVMRetValue : public TVMPODValue_ {
   using TVMPODValue_::operator int;
   using TVMPODValue_::operator bool;
   using TVMPODValue_::operator void*;
-  using TVMPODValue_::operator TVMArray*;
+  using TVMPODValue_::operator DLTensor*;
Why do we pass the underlying pointer instead of the container?
We store them by pointer, but from the user's perspective, they pass in containers.
I see now, looking at the Python code more carefully.
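A sketch of the point being made here, under the assumption that the conversion operators in this diff are what a callee uses to extract arguments; the callee itself is hypothetical:

#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

using namespace tvm::runtime;

// Hypothetical PackedFunc body: the caller passes an NDArray container,
// and the callee can view it as the raw DLTensor* or take the container.
void ExampleCallee(TVMArgs args, TVMRetValue* rv) {
  DLTensor* raw = args[0];  // raw pointer view; no ref-count change
  NDArray arr = args[0];    // container view; ref count is increased
  (void)raw;
  *rv = arr;                // PackedFunc can now also return NDArray
}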
  }
  delete ptr;
}
// Deleter for NDArray converted from DLPack
Just to make sure I understand: this deleter is used for data that comes from external DLPack arrays, not allocated inside of TVM? Could we maybe improve the comment here? We could borrow from the text of the PR; it would probably help future readers of this code.
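For context, this is roughly what a deleter for DLPack-imported data looks like: it hands the memory back to the producing framework instead of freeing it through TVM's device API. A sketch based on the discussion; the PR's actual code may differ in details:

#include <dlpack/dlpack.h>
#include <tvm/runtime/ndarray.h>

// Sketch: deleter for a container wrapping an external DLManagedTensor.
// The data was not allocated inside TVM, so we invoke the producer's
// deleter rather than TVM's own free path, then free only the container.
static void DLPackDeleter(tvm::runtime::NDArray::Container* ptr) {
  DLManagedTensor* tensor = static_cast<DLManagedTensor*>(ptr->manager_ctx);
  if (tensor->deleter != nullptr) {
    (*tensor->deleter)(tensor);
  }
  delete ptr;
}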
src/runtime/ndarray.cc
  }
  delete ptr;
}
// local create function that
Could we improve the comment to be a full sentence? I can do another pass later if you don't have cycles.
i.e., "Local create function which allocates tensor metadata but does not allocate space for the data."
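To illustrate the suggested wording, a sketch of a create function that sets up metadata only; the signature and the field access here are assumptions for illustration, not necessarily the PR's code:

#include <tvm/runtime/ndarray.h>
#include <utility>
#include <vector>

using tvm::runtime::NDArray;

// Sketch: fills in shape/dtype/context metadata but leaves dl_tensor.data
// null, so no device memory is allocated; data space is allocated separately.
NDArray::Container* CreateContainer(std::vector<int64_t> shape,
                                    DLDataType dtype, DLContext ctx) {
  NDArray::Container* data = new NDArray::Container();
  data->shape_ = std::move(shape);
  data->dl_tensor.shape = data->shape_.data();
  data->dl_tensor.ndim = static_cast<int>(data->shape_.size());
  data->dl_tensor.dtype = dtype;
  data->dl_tensor.ctx = ctx;
  data->dl_tensor.data = nullptr;
  data->dl_tensor.strides = nullptr;
  data->dl_tensor.byte_offset = 0;
  return data;
}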
include/tvm/runtime/ndarray.h
                     DLDataType dtype,
                     DLContext ctx);
/*!
 * \brief Create a NDArray backed by dlpack tensor.
"Create a NDArray backed by a dlpack tensor."?
@jroesch I addressed all of your comments; can you take another look?
@tqchen seems good to me 👍
Previously, TVMArray simply allocated a DLTensor, which is a plain C type. This PR changes that into an NDArray, which is reference counted and can be used as a container in C++.
TVMArray is now backed by the C++ NDArray::Container and can be passed back and forth in PackedFunc. The Python NDArray argument is now passed as the NDArray type.
The callee can also directly take out the DLTensor* when an NDArray is passed, which is compatible with the previous behavior. Moreover, the API allows the callee to take out the NDArray from TVMArgs and capture it in the function (by automatically increasing the ref count). PackedFunc can now also return NDArray.
The NDArray::Container* is DLTensor*-compatible and can be directly viewed as a DLTensor*.
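To make the description concrete, a hedged usage sketch; my_func stands for any registered PackedFunc and is not part of this PR, and the Empty spelling follows the CamelCase resolution in the naming thread above:

#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

using namespace tvm::runtime;

void Example(PackedFunc my_func) {
  // Allocate a 2x3 float32 array on CPU.
  NDArray a = NDArray::Empty({2, 3}, DLDataType{kDLFloat, 32, 1},
                             DLContext{kDLCPU, 0});
  // Pass it through a PackedFunc and get an NDArray back; the ref count
  // is managed automatically on both sides.
  NDArray b = my_func(a);
  // The container is DLTensor*-compatible, so the old raw view still works.
  const DLTensor* view = b.operator->();
  (void)view;
}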
Known Limitations
NDArray is a reference-counted type and cannot cross the RPC boundary. When an NDArray is passed as an argument to a remote function, the argument is downgraded to DLTensor*, so passing an NDArray as an argument still works if the remote function modifies the data content. A remote function cannot return an NDArray.